LINGO, an Efficient Holographic Text Based Method To Calculate Biophysical Properties and Intermolecular Similarities

Authors

  • David Vidal
  • Michael Thormann
  • Miquel Pons
Abstract

SMILES strings are the most compact text-based molecular representations. Implicitly they contain the information needed to compute all kinds of molecular structures and, thus, molecular properties derived from these structures. We show that this implicit information can be accessed directly at the SMILES string level, without the time-consuming explicit conversion of the SMILES strings into molecular graphs or 3D structures and the subsequent 2D or 3D QSPR calculations. Our method is based on the fragmentation of SMILES strings into overlapping substrings of a defined size that we call LINGOs. The integral set of LINGOs derived from a given SMILES string, the LINGO profile, is a hologram of the SMILES representation of the molecule described. LINGO profiles provide input for QSPR models and the calculation of intermolecular similarities at very low computational cost. The octanol/water partition coefficient (LlogP) QSPR model achieved a correlation coefficient R² = 0.93, a root-mean-square error RRMS = 0.49 log units, a goodness-of-prediction correlation coefficient Q² = 0.89, and a QRMS = 0.61 log units. The intrinsic aqueous solubility (LlogS) QSPR model achieved correlation coefficient values of R² = 0.91 and Q² = 0.82, with RRMS = 0.60 and QRMS = 0.89 log units. Integral Tanimoto coefficients computed from LINGO profiles provided sharp discrimination between random and bioisosteric pairs extracted from the Accelrys Bioster Database. Average similarities (LINGOsim) were 0.07 for the random pairs and 0.36 for the bioisosteric pairs.
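
The fragmentation and similarity steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The substring length `q = 4`, the function names `lingo_profile` and `lingo_similarity`, and the exact count-based form of the Tanimoto coefficient are assumptions made for illustration, since the abstract does not specify them. Any SMILES preprocessing the method may apply before fragmentation is also omitted.

```python
from collections import Counter


def lingo_profile(smiles: str, q: int = 4) -> Counter:
    """Count every overlapping substring of length q in a SMILES string.

    q = 4 is an assumed default; the abstract only says "a defined size".
    """
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingo_similarity(a: str, b: str, q: int = 4) -> float:
    """Count-based Tanimoto-style similarity between two LINGO profiles.

    Assumed form: average, over all LINGOs present in either profile,
    of 1 - |Na - Nb| / (Na + Nb), where Na and Nb are occurrence counts.
    """
    pa, pb = lingo_profile(a, q), lingo_profile(b, q)
    lingos = set(pa) | set(pb)
    if not lingos:
        # Both strings are shorter than q, so there is nothing to compare.
        return 0.0
    total = sum(1 - abs(pa[k] - pb[k]) / (pa[k] + pb[k]) for k in lingos)
    return total / len(lingos)


if __name__ == "__main__":
    print(lingo_profile("CCCCO"))              # profile of 1-butanol
    print(lingo_similarity("CCCCO", "CCCCN"))  # 1-butanol vs 1-butylamine
```

Because the profile is built with plain string slicing, no molecular graph or 3D structure is ever constructed, which is the source of the low computational cost claimed above.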

Similar resources

SIML: A Fast SIMD Algorithm for Calculating LINGO Chemical Similarities on GPUs and CPUs

LINGOs are a holographic measure of chemical similarity based on text comparison of SMILES strings. We present a new algorithm for calculating LINGO similarities amenable to parallelization on SIMD architectures (such as GPUs and vector units of modern CPUs). We show that it is nearly 3x as fast as existing algorithms on a CPU, and over 80x faster than existing methods when run on a GPU.

Semi Empirical Calculation of Intermolecular Potentials and Transport Properties of Some Binary and Ternary Industrial Refrigerant Mixtures

In this study the intermolecular potential energies of some environment-friendly industrial HFC refrigerants were obtained through the inversion method which is based on the corresponding states principle. These potentials were later employed in calculation of transport properties (viscosity, diffusion, thermal conductivity and thermal diffusion factor) of some binary and ternary refrigerant mi...

Electrically switchable cylindrical Fresnel lens based on holographic polymer-dispersed liquid crystals using a Michelson interferometer

Fabricating an electrically switchable cylindrical Fresnel lens based on holographic polymer-dispersed liquid crystals (H-PDLC) using a Michelson interferometer is reported. Simplicity of the method and possibility of fabricating different focal length lenses in a single set up are among the advantages of the method. It is demonstrated that the Fresnel structured zone plate acts as a cylindrica...

Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...

An Optimal Approach to Local and Global Text Coherence Evaluation Combining Entity-based, Graph-based and Entropy-based Approaches

Text coherence evaluation has become a vital and appealing task in Natural Language Processing subfields such as text summarization, question answering, text generation, and machine translation. Existing methods, such as entity-based and graph-based models, deal with how nouns and noun phrases change role across sequential sentences within a short part of a text. They even have limitations in global coheren...

Journal:
  • Journal of chemical information and modeling

Volume 45, Issue 2

Pages -

Publication date: 2005